Assumptions and Interventions of Probabilistic Causal Models
Author
Abstract
Causality is an intriguing but controversial topic in philosophy, statistics, and educational and psychological research. By supporting the Causal Markov Condition and the Faithfulness Condition, Clark Glymour attempted to draw causal inferences from structural equation modeling. According to Glymour, in order to make a causal interpretation of non-experimental data, the researcher must apply some type of manipulation, rather than conditioning, to the variables. The Causal Markov Condition and its sister, the common cause principle, provide the assumptions needed to structure relationships among variables in the path model and to load different variables onto common latent constructs in the factor model. In addition, the Faithfulness Condition rules out those models in which statistical independence relations follow as a result of special coincidences among the parameter values. The arguments against these assumptions by Nancy Cartwright as well as those for these assumptions by James Woodward will be evaluated in this paper.

Causal assumptions 3

Assumptions and Interventions of Probabilistic Causal Models

Chong Ho Yu, Ph.D.

Introduction

Causality is an intriguing but controversial topic in philosophy, statistics, and the social sciences. Since the introduction of Pearson’s Product Moment Correlation Coefficient, many statisticians and social scientists have been conducting research based upon association. For a long time the question of whether quantitative methodologies can lead us to causal inferences has remained unsettled. There are sound reasons why people are skeptical toward causal inferences yielded by statistical models. Yule (1926) pointed out that we can sometimes get nonsense-correlations between time-series. For instance, if you plot GNP, educational level, or almost anything else against time, you may see some significant correlation.
On the other hand, the existence of bad research studies does not mean that we should abandon the endeavor altogether. In recent years, Glymour and his CMU group (Glymour, 1982, 1983; Glymour, Scheines, Spirtes, & Kelly, 1987; Glymour, 1999; Glymour & Cooper, 1999) have been devoting efforts to the TETRAD project in an attempt to affirm causal inferences based upon correlational information and non-experimental data. Not surprisingly, many scholars have voiced either support for or objections to Glymour et al.’s approach.

Interestingly enough, numbers per se cannot determine whether causal information can be extracted from the data or the mathematical model. Basically, both proponents and opponents of using a statistical approach to causality utilize the same numeric information. For instance, structural equation modeling, the causal model endorsed by Glymour and Pearl (2000), is composed of a measurement model and a path model. In a measurement model, Pearson’s Product Moment Correlation Coefficient, which is associational in essence and a-causal in origin, is used for factor analysis. In addition, the hypothesis testing widely used today by statisticians and social scientists is a fusion of Fisher’s, Pearson’s, and Neyman’s models. As mentioned before, Pearson accepted association only and de-emphasized causality. Regardless of whether you believe in causality or not, you may still conduct hypothesis testing, run Pearson’s Correlation Coefficient, and/or do factor analysis, unless you totally reject quantitative methods.

If numbers and mathematics alone cannot settle the debate about causality, then where can we go to investigate the problem? I believe that the problem concerns philosophical aspects, such as the unproved assumptions of statistical modeling. In this paper, two major assumptions of Glymour’s TETRAD will be discussed.
The arguments against these assumptions by Nancy Cartwright as well as those for these assumptions by James Woodward will be evaluated.

Conditioning

As mentioned in the beginning, when researchers compute the association of observational (non-experimental) data, the relationships sometimes seem to be nonsense. In order to gain more insight, careful statisticians might partition the data by grouping variables or other lurking variables. This kind of activity can be considered “conditioning.” For example, in a research study regarding the relationship between the birth weight of babies and the age of mothers (an example dataset included in DataDesk, Data Description, Inc., 1999), the regression slope using the full dataset (see the black line and the blue bar in Figure 1) indicates that as the age of mothers increases, the birth weight of babies increases. This relationship is counter-intuitive because usually, as the mother gets older, the chance of giving birth to a healthy baby becomes lower.

Figure 1. Relationships among birth weight, age of mother, and race

However, when the dataset is partitioned by a grouping variable, race, the issue becomes more complicated. The positive relationship between birth weight and age holds among whites. For blacks, the relationship is negative (see the red line and the red bar in Figure 1), while for other ethnic groups the regression slope is almost flat, and therefore no significant relationship is implied. Please keep in mind that this study is non-experimental, for the researcher did not manipulate age, race, or birth weight. Conditioning on this kind of observational data always faces the same problem: no broad generalization about relationships can be firmly made, because further conditioning and partitioning may reverse the relationship discovered in the aggregate dataset.
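The reversal described above can be sketched with a small Python example. The numbers are invented for illustration and are not the DataDesk dataset; the group names, intercepts, and slopes are all hypothetical, and the data are deliberately noise-free so that the arithmetic is easy to follow.

```python
# Hypothetical, noise-free data (NOT the DataDesk dataset): mother's age and
# birth weight in grams for three made-up groups.
def make_group(ages, intercept, slope):
    """Return (age, weight) pairs lying exactly on a line for one group."""
    return [(age, intercept + slope * (age - 18)) for age in ages]

groups = {
    "group_a": make_group(range(18, 38), 2900, 25),   # weight rises with age
    "group_b": make_group(range(18, 28), 3100, -20),  # weight falls with age
    "group_c": make_group(range(18, 28), 3000, 0),    # no relationship
}

def ols_slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Slope from the aggregate data, ignoring the grouping variable.
all_points = [p for pts in groups.values() for p in pts]
aggregate_slope = ols_slope(all_points)

# Slopes after conditioning on (partitioning by) the grouping variable.
group_slopes = {name: ols_slope(pts) for name, pts in groups.items()}

print(f"aggregate slope: {aggregate_slope:.2f}")
for name, slope in group_slopes.items():
    print(f"{name} slope: {slope:.2f}")
```

With these invented numbers the aggregate slope is positive even though one group has a negative slope and another has none, which mirrors the pattern described for Figure 1. Partitioning further by some other variable could change the picture again.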
Intervention and Manipulation

According to Meek and Glymour (1994), computing probabilities by conditioning on an event is very different from computing probabilities upon an intervention to bring about that event. When they read about intervention, readers may get the impression that Glymour was talking about conducting experiments, in which human interventions are imposed on various scenarios. In Glymour’s view, however, intervention does not necessarily happen at the data collection stage. At the data analysis stage, data manipulation and model building can also be viewed as a different kind of intervention.

Meek and Glymour compared the Fisherian tradition with their own work to show the continuity between the two. Fisher’s design of experiments could achieve two objectives: (1) to ensure that treatment assignment and outcome have no common causes and are independent if treatment has no effect on outcome; (2) to determine a definite joint probability distribution for treatment and outcome under the assumption of no effect (the null hypothesis). On one hand, Fisher’s design of experiments requires randomization of group assignment to rule out common causes. On the other hand, Meek and Glymour asserted that causal claims entail claims about intervention or manipulation. If the research study is not experimental, then how can the logic of the Fisherian school be applied to causal inferences from non-experimental data? Spirtes, Glymour, and Scheines (1993) proposed that two assumptions could be employed to bridge the gap between the causal structure and the non-experimental data: the Causal Markov Condition (CMC) and the Faithfulness Condition (FC). In their view, equipped with these two assumptions, researchers can draw causal inferences as if intervention or manipulation had been applied to the data.

Causal Markov Condition

In a causal model, the joint probability distribution over the variables must satisfy CMC (Druzdzel & Glymour, 1995).
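One standard way to write this requirement is sketched below. It is offered as an illustration rather than as the authors' own formulation, and it assumes that the causal structure is a directed acyclic graph over variables X_1, ..., X_n. The symbol pa(X_i), denoting the set of direct causes (parents) of X_i, is notation introduced here for illustration.

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```

Under this factorization a variable with no parents contributes only its marginal probability, and the joint distribution is fully determined by one conditional distribution per variable.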
In CMC, each variable is probabilistically independent of its non-descendants, conditional on its parents. In Figure 2, suppose that X1 and X2 are probabilistically independent of each other, and that they both contribute to the effect X3. X4 is independent of X1 and X2, conditional on X3. If X1 and X2 were not probabilistically independent, the model would be problematic. For example, in a regression model, when independent variables are highly correlated, the problem of multicollinearity exists and the model is not interpretable.

The Causal Markov Condition is the assumption of the path model, in which relationships among variables are structured. The path model is one of the components of the structural equation model adopted by causationists. The Causal Markov Condition also implies the common cause principle proposed by Reichenbach (1956) and advocated by Glymour and his colleagues (Glymour, 1982; Glymour, Scheines, Spirtes, & Kelly, 1987). According to the common cause principle, if a system of variables satisfies the Markov Condition, and the variables have a high degree of association, then there exists a latent construct (factor) causing them. The common cause principle is the underlying assumption of the factor model, which is also a building block of the structural equation model.

Figure 2. Example of the Causal Markov Condition

Faithfulness Condition

According to the Faithfulness Condition, statistical constraints arise from structure, not coincidence. As the name implies, FC supposes that probabilistic dependencies will faithfully reveal causal connections. In other words, all independence and conditional independence relations among observed variables are consequences of the CMC applied to the true causal structure. For example, a research study (cited in Glymour, 1987) indicates that providing financial aid to released prisoners did not reduce recidivism.
An alternative explanation is that free money discourages employment, and unemployment has a positive effect on recidivism, while financial aid tends to lower recidivism. As a result, these two effects cancel each other out (Figure 3). However, the Faithfulness Condition rules out this explanation.

Figure 3. Example of the Faithfulness Condition

Manipulation Theorem

Meek and Glymour (1994) proposed that when probabilities satisfy CMC and FC, and when the intervention is ideal in the sense of manipulation, causal inferences are legitimate. This notion is termed the “manipulation theorem.” To be specific, given an external intervention on a variable A in a causal model, the researcher can derive the posterior probability distribution over the entire model by simply modifying the conditional probability distribution of A. If this intervention is strong enough to set A to a specific value, the researcher can view the intervention as the only cause of A. Nothing else in the model needs to be modified, as the causal structure of the system remains unchanged.

To implement this theorem, Glymour and his CMU group developed a software plug-in named TETRAD to manipulate or intervene on structural equation models by searching all possible paths among variables (manipulation by “what-if”). It is important to note that TETRAD is not something entirely new. Popular structural equation modeling software applications such as LISREL and EQS have their own automatic path searching algorithms. Nevertheless, Ting (1998) found that the hit rates (the success rates of uncovering the right causal structure) of TETRAD’s automatic search procedure reach 95% for large samples (n=2000) and 52% for small samples (n=200), which are far higher than those offered by LISREL and EQS.

Cartwright’s Arguments against Glymour’s Ideas

Empiricist view: No causes in, no causes out

Many philosophers are opposed to the preceding idea.
Due to space constraints, this paper will concentrate on Nancy Cartwright only. Cartwright (1999) emphasized the point of “no causes in, no causes out” (p. 39). To be specific, there is no way to get causal information from equations and associations. New causal knowledge must be built only from old, empirical causal knowledge. In other words, the empiricist’s rule embraced by Cartwright is that the relevant data are the data that will fix the truth or falsity of the hypothesis, given the other known facts. Glymour et al. included all possible combinations of variables and paths in the model and then eliminated the irrelevant ones. Cartwright objected that if relevant variables and genuine causes are not included at the beginning, then this elimination approach is useless. For these reasons, Cartwright strongly criticized Glymour et al.’s theory:

“Because Glymour, Scheines, Kelly, and Spirtes employ the hypothetico-deductive method, they must proceed in the opposite order. Their basic strategy for judging among models is two-staged: first list all the relevant relations that hold in data, then scan the structures to see which accounts for the greatest number of these relations in the simplest way. That means that they need to find some specific set of relations that will be relevant for every model. But, from the empiricist point of view, no such thing exists.” (p. 78)

In questioning the applicability of CMC, Cartwright (1999) used a classical example to argue that researchers run the risk of confusing a co-symptom with a cause: in R. A. Fisher’s opinion, smoking does not cause lung cancer. Rather, smoking and lung cancer share a common cause: a special gene that increases the tendency both to smoke and to get cancer. Not surprisingly, Cartwright asserted that to investigate a hypothesis like this, one must conduct a randomized experiment instead of counting on CMC and mathematical intervention on non-experimental data.
Actually, Glymour and his CMU group do not rely on equations alone. Rather, they still use empirical data, though the data are non-experimental. It seems that in Cartwright’s view, non-experimental data are not “empirical” enough. First, it is a well-known fact that most data in astronomy and geology are not experimental, yet many conclusions in these disciplines qualify as causal inferences. Second, according to abductive logic, new knowledge does not necessarily arise from old, empirical knowledge (Yu, Behrens, & Ohlund, under review). Nonetheless, these popular arguments will not be repeated here. Instead, the discussion will focus on the nature of empirical data.

A simple definition of empirical data is data that are collected through sensory input and can be verified by sensory channels or logical means. When observational data on various variables are measured and computed, are their statistical properties empirical? Assume that the data values of these variables indicate a high degree of internal consistency and a single dimension, so that the variables satisfy the common cause principle and are collapsed into a single factor. Can we then regard properties such as “internal consistency” (in the psychometric sense, not the logical sense) and “unidimensionality” as empirical? My answer is “yes” because they are absolutely verifiable.

Further, assume that I took an IQ test and achieved a score of 200; is the psychometric attribute “high intelligence” empirical? According to strict empiricists, the answer is “no” because the score was not obtained by repeated experiments. Gaining a high score on one single test could be due to pure luck. Right before the test is administered to me, I might take Ginkgo Biloba or read a book carrying IQ test items that are similar to those on the test.
To estimate my IQ score in a scientific manner, I would have to retake the same test several times and demonstrate a high degree of stability of the test score over time. However, in many experimental studies subjects are tested or measured just once. In theory, the subjects’ memory of the test should be wiped out so that no carry-over effect is present when subjects are retested. Needless to say, it is impossible and unethical to erase people’s memory. Indeed, the reliability of many experimental scores is established by mathematical modeling. To be specific, by way of a thought experiment, the true score model assumes that if the same person took the same test over and over, the error scores would scatter around the true score, and the observed score is the composite of the true score and the error score. Hence, mathematical models are applied to minimize the error score (Yu, 2001). Please keep in mind that this “manipulation” of the test score is carried out during the data analysis.

Last but not least, it is doubtful whether rejecting the method on the grounds that a model may leave out some genuine causes or relevant variables helps scientific progress at all. First, who could affirm that all relevant variables are included in the model, except the omnipotent God? Second, is it really necessary to include all relevant variables? In defense of his standpoint, Glymour (1999) wrote, “Cartwright is perhaps correct that the whole truth about anything is very complex; but, quite properly, science is seldom interested in the whole truth, or aided by insistence upon it. In my view, an inquiry that correctly found the causes of most of the variations in a social phenomenon and neglected small causes would be a triumph.” (p. 59)

Causal Markov Condition, probabilistic causation, and Simpson’s paradox

In addition, while Glymour et al.
based their causal modeling on probability, Cartwright (1999) believed that causal laws cannot be reduced to probabilistic laws and thus that CMC is questionable. According to Cartwright, “probabilities may be a guide to causes, but they are like a symptom of a disease: there is no general formula to get from symptom to disease” (p. 243). Nevertheless, she did not reject CMC altogether. Rather, she pointed out that there is no universal condition that can be imposed on all causal structures. By citing Simpson’s Paradox (Simpson, 1951), in which the conclusion drawn from the aggregate data is contradicted by the conclusion drawn from the contingency table based upon the same data, Cartwright (1983, 1999b) asserted that universal causal inferences are misleading. The so-called causal relationship is always confined to a particular population. For instance, a 20-year follow-up study was once conducted in England to examine the survival and death rates of smokers and non-smokers. The result implied a significant positive effect of smoking because only 24% of smokers died compared to 31% of non-smokers. However, when the data were broken down by age group in a contingency table, it was found that there were more older people in the non-smoker group (Appleton & French, 1996).

Based on Simpson’s Paradox, Dupre and Cartwright (1988) suggested that there are only probabilistic capacities, but no probabilistic causal laws at all. In Cartwright’s view, causal explanation depends on the stability of capacities. In contrast to probabilistic causation, which is relative to grouping variables, capacities remain the same when removed from the context in which they are measured.

Inconsistent results happen all the time. If we rejected probabilistic causation because there is inconsistency, many research projects would become impossible. As a matter of fact, the discovery of Simpson’s Paradox does not discourage researchers from drawing generalizations. Instead, different techniques have been employed by statisticians and social scientists to counteract the
potential threat of Simpson’s Paradox. For example, by simulation, Hsu (1989) found that when the sample size is small, randomization tends to make groups non-equivalent and to increase the possibility of Simpson’s Paradox. Thus, after randomization with a small sample size, researchers are encouraged to check the group characteristics on different dimensions (e.g., race, sex, age), and re-assignment of group membership is recommended if non-equivalent groups exist.

Further, to avoid Simpson’s Paradox, Olkin (2000) recommended that researchers employ meta-analysis rather than pooling. In pooling, data sets are first combined and then groups are compared. As a result, the conclusion drawn from the combined data set could be misleading, while insights about the research question are hidden in the partitioned data. By contrast, in meta-analysis (Glass, 1976; Glass & Smith, 1981; Hunter & Schmidt, 1990) groups in different data sets are first compared in terms of effect size, and then the comparisons are combined to infer a generalization (Table 1). In other words, the information in the partitioned datasets is given consideration first.

Table 1. Example of meta-analysis
Study ID | Experimental group mean | Control group mean | Effect size | Correlation coefficient
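The difference between pooling and comparing within studies first can be sketched in Python. The counts below are hypothetical and are not taken from any of the cited studies. The outcome is a success proportion, and the combination rule is a simple sample-size-weighted average of per-study differences. That rule is an assumption chosen for brevity and is only one of several weighting schemes used in meta-analysis.

```python
# Hypothetical (successes, total) counts for two made-up studies.
studies = {
    "study_1": {"experimental": (90, 100), "control": (800, 1000)},
    "study_2": {"experimental": (300, 1000), "control": (20, 100)},
}

def rate(successes, total):
    """Proportion of successes in one group."""
    return successes / total

def pooled_difference(studies):
    """Pooling: combine the raw counts first, then compare the groups."""
    totals = {"experimental": [0, 0], "control": [0, 0]}
    for groups in studies.values():
        for name, (successes, total) in groups.items():
            totals[name][0] += successes
            totals[name][1] += total
    return rate(*totals["experimental"]) - rate(*totals["control"])

def per_study_differences(studies):
    """Compare the groups within each study separately."""
    return {
        study_id: rate(*groups["experimental"]) - rate(*groups["control"])
        for study_id, groups in studies.items()
    }

def combined_difference(studies):
    """Average the per-study differences, weighting by study sample size."""
    diffs = per_study_differences(studies)
    sizes = {
        study_id: groups["experimental"][1] + groups["control"][1]
        for study_id, groups in studies.items()
    }
    total_size = sum(sizes.values())
    return sum(diffs[s] * sizes[s] / total_size for s in studies)

pooled = pooled_difference(studies)
by_study = per_study_differences(studies)
combined = combined_difference(studies)

print(f"pooled difference:   {pooled:+.3f}")
for study_id, diff in by_study.items():
    print(f"{study_id} difference: {diff:+.3f}")
print(f"combined difference: {combined:+.3f}")
```

With these invented counts the experimental group does better within each study, yet the pooled comparison points the other way because group sizes are very unbalanced across the two studies. Comparing within studies first keeps that information visible.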